Towards Ad-Hoc GPU Acceleration Of Parallel Eigensystem Computations

Authors

  • Michael T. Garba
  • Horacio González-Vélez
Abstract

This paper explores an early implementation of high-performance routines for computing the eigenvalues and eigenvectors of multiple large Hermitian systems on a Graphics Processing Unit (GPU). We report a performance increase of up to two orders of magnitude over the original EISPACK routines on an NVIDIA Tesla C2050 GPU, potentially allowing an order-of-magnitude increase in the complexity or resolution of a neutron scattering modeling application.
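The abstract does not show the GPU routines themselves. As a rough illustration of the problem shape only, the following Python sketch computes the eigenvalues of a batch of independent Hermitian matrices on the CPU. It is not the authors' implementation and not the EISPACK algorithm: it swaps in the cyclic Jacobi method, applied to the standard real symmetric embedding M = [[A, -B], [B, A]] of H = A + iB, whose spectrum is that of H with every eigenvalue repeated twice. The function names are hypothetical, and eigenvectors are omitted for brevity.

```python
import math


def jacobi_eigvalsh(a, rel_tol=1e-14, max_sweeps=50):
    """Eigenvalues of a real symmetric matrix by cyclic Jacobi rotations."""
    n = len(a)
    a = [list(map(float, row)) for row in a]  # work on a float copy
    # Squared Frobenius norm; orthogonal rotations leave it unchanged.
    total = sum(x * x for row in a for x in row)
    for _ in range(max_sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(n) if i != j)
        if off <= (rel_tol ** 2) * total:
            break  # off-diagonal mass is negligible: diagonal holds eigenvalues
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                # Angle that zeroes a[p][q]: tan(2*theta) = 2*a_pq / (a_qq - a_pp)
                theta = 0.5 * math.atan2(2.0 * a[p][q], a[q][q] - a[p][p])
                c, s = math.cos(theta), math.sin(theta)
                for k in range(n):  # A <- A R (update columns p and q)
                    akp, akq = a[k][p], a[k][q]
                    a[k][p] = c * akp - s * akq
                    a[k][q] = s * akp + c * akq
                for k in range(n):  # A <- R^T A (update rows p and q)
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k] = c * apk - s * aqk
                    a[q][k] = s * apk + c * aqk
    return sorted(a[i][i] for i in range(n))


def hermitian_eigvalsh(h):
    """Eigenvalues of a Hermitian H = A + iB via the real symmetric
    embedding M = [[A, -B], [B, A]], whose eigenvalues are those of H,
    each appearing twice."""
    n = len(h)
    m = [[0.0] * (2 * n) for _ in range(2 * n)]
    for i in range(n):
        for j in range(n):
            z = complex(h[i][j])
            m[i][j] = m[i + n][j + n] = z.real  # A blocks
            m[i][j + n] = -z.imag               # -B block
            m[i + n][j] = z.imag                # B block
    return jacobi_eigvalsh(m)[::2]  # keep one copy of each doubled eigenvalue


def batched_hermitian_eigvalsh(batch):
    """Solve many independent Hermitian eigenproblems. The problems share
    no data; this sketch simply loops over them on the CPU."""
    return [hermitian_eigvalsh(h) for h in batch]


if __name__ == "__main__":
    batch = [
        [[2, 1 - 1j], [1 + 1j, 3]],  # eigenvalues 1 and 4
        [[0, -1j], [1j, 0]],         # Pauli Y: eigenvalues -1 and 1
    ]
    print(batched_hermitian_eigvalsh(batch))
```

Because the matrices in a batch are independent, each call to `hermitian_eigvalsh` could in principle be handed to its own GPU thread block. That mapping is an assumption about how such a port might be organized, not a description of the paper's code.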

Related articles

CU-Simulator: A Parallel Scalable Simulation Platform for Radio Channel in Wireless Sensor Networks

Due to their computationally intensive nature, currently available WSN simulators, which are based on traditional CPU computing architectures, cannot scale linearly. In this paper, we propose and set up CU-Simulator, a parallel radio channel simulator that improves the performance of simulating data packet transmission in WSNs using NVIDIA’s CUDA-enabled GPU parallel computing archi...

Fastplay-A Parallelization Model and Implementation of SMC on CUDA based GPU Cluster Architecture

We propose a four-tiered parallelization model for accelerating secure multiparty computation (SMC) on a CUDA-based Graphics Processing Unit (GPU) cluster architecture. The specification layer is the top layer, which adopts the SFDL of Fairplay for the specification of secure computations. The SHDL file generated by the SFDL compiler of Fairplay is used as input to the function layer, for whic...

GPU Acceleration of the Generalized Interpolation Material Point Method

This paper describes our experience rewriting a sequential particle-in-cell code so that its key computations are executed on a GPU. This code is well-suited to GPU acceleration, as it performs data-parallel operations on a regular grid. Key performance challenges are the need for global synchronization in mapping particles to grid nodes, and managing memory bandwidth to global memory. Performa...

Improving Inter-thread Data Sharing with GPU Caches

The massive amount of fine-grained parallelism exposed by a GPU program makes it difficult to exploit shared cache benefits even when there is good program locality. The nondeterministic nature of thread execution in the bulk synchronous parallel (BSP) model makes the situation even worse. Most prior work in exploiting GPU cache sharing focuses on regular applications that have linear memory acces...

High Performance Implementation of Fuzzy C-Means and Watershed Algorithms for MRI Segmentation

Image segmentation is one of the most common steps in digital image processing. Many image segmentation algorithms (e.g., thresholding, edge detection, and region growing) are employed to classify a digital image into different segments. Finding a suitable algorithm for medical image segmentation is a challenging task, mainly due to noise, low contrast, and steep...

Publication date: 2011